Background
This process uses an offline geocoding database running in a Docker container on the user’s computer. The geocoder is called Nominatim and is developed as part of the OpenStreetMap project.
Benefits of this approach
- It’s free
- It is done entirely behind our local firewall, so protected health information (PHI) does not go out to the internet
- It can be called using R, allowing for reproducibility and direct integration with other data analysis processes
Disadvantages of this approach
- Relatively involved setup process
- I’m not sure how accuracy compares to Google or ArcGIS
Prerequisites
Install R and RStudio.
This assumes you are running Windows. For a link to the install page, please see this site.
Install the tmaptools package
After you’ve installed R and RStudio, open RStudio, type the following into the console, and press Enter:
install.packages("tmaptools")
You only need to install tmaptools once.
Online Geocoding in R
For example data, we’ll use the Rhode Island state capitol. This is what I put in R:
RI_state_capitol <- "82 Smith St, Providence, RI 02903"
Then I’m using the geocode_OSM() function from the tmaptools package to geocode:
#this loads the library, you have to do this every time you restart R
library(tmaptools)
tmaptools::geocode_OSM(RI_state_capitol)
## $query
## [1] "82 Smith St, Providence, RI 02903"
##
## $coords
## x y
## -71.41496 41.83090
##
## $bbox
## xmin ymin xmax ymax
## -71.41558 41.83064 -71.41434 41.83115
This works, yay! The problem is that the default is to send the address to the online server.
From the documentation:
geocode_OSM(
  q,
  projection = NULL,
  return.first.only = TRUE,
  keep.unfound = FALSE,
  details = FALSE,
  as.data.frame = NA,
  as.sf = FALSE,
  geometry = c("point", "bbox"),
  server = "https://nominatim.openstreetmap.org"
)
But there’s also the option to send the addresses to a local Nominatim server:
server: OpenStreetMap Nominatim server name. Could also be a local OSM Nominatim server
Mediagis/Nominatim Docker image
OpenStreetMap allows people to download and set up an offline database of addresses for geocoding. However, the Nominatim documentation doesn’t have install instructions for Windows, so I decided the easiest thing to do would be to run Nominatim as a Docker container. Here’s how I did that:
I installed Docker following these instructions.
Then I followed these instructions to set up a Nominatim Docker container and entered the following in the Windows command prompt:
docker run -it --rm -e PBF_URL=https://download.geofabrik.de/north-america/us/rhode-island-latest.osm.pbf -e REPLICATION_URL=https://download.geofabrik.de/north-america/us-northeast-updates/ -p 8080:8080 --name nominatim mediagis/nominatim:4.0
I only want to geocode addresses in Rhode Island, so that’s why I used the URLs that I did. A bunch of output scrolled past in the command prompt, and then I got this as the last line:
database system is ready to accept connections
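One way to confirm that the server is actually answering requests, rather than relying on the log line alone, is to ask it for its status from R. This is only a minimal sketch: nominatim_is_ready() is a made-up helper, and it assumes the container is published on port 8080 as in the command above and that the server exposes Nominatim's status.php endpoint, which replies with OK when the database is usable.

```r
# Hypothetical helper (not part of tmaptools or Nominatim).
# Returns TRUE if the local server answers its status endpoint with "OK",
# and FALSE if the request fails for any reason (server down, still
# importing, wrong port, ...).
nominatim_is_ready <- function(base_url = "http://localhost:8080") {
  # Assumption: the status endpoint lives at <base_url>/status.php
  status_url <- paste0(sub("/+$", "", base_url), "/status.php")
  tryCatch({
    reply <- suppressWarnings(readLines(status_url, warn = FALSE))
    any(grepl("OK", reply, fixed = TRUE))
  }, error = function(e) FALSE)
}

nominatim_is_ready()
```

If this returns FALSE right after starting the container, the import may still be running, so it can be worth trying again a few minutes later.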
Offline Geocoding in R
I closed the command prompt and went back to R. In R I changed the server= argument in geocode_OSM() to server = "http://localhost:8080/", which corresponds to the local port I chose for the Nominatim Docker container. I disconnected from the internet and tested the offline local Nominatim server with several addresses:
# State Capitol
tmaptools::geocode_OSM(RI_state_capitol, server = "http://localhost:8080/")
## $query
## [1] "82 Smith St, Providence, RI 02903"
##
## $coords
## x y
## -71.41496 41.83090
##
## $bbox
## xmin ymin xmax ymax
## -71.41558 41.83064 -71.41434 41.83115
#Newport Tennis Museum
tmaptools::geocode_OSM("194 Bellevue Ave, Newport, RI 02840", server = "http://localhost:8080/")
## $query
## [1] "194 Bellevue Ave, Newport, RI 02840"
##
## $coords
## x y
## -71.30837 41.48278
##
## $bbox
## xmin ymin xmax ymax
## -71.30842 41.48273 -71.30832 41.48283
#Massachusetts State Capitol
tmaptools::geocode_OSM("24 Beacon St, Boston, MA 02133", server = "http://localhost:8080/")
## No results found for "24 Beacon St, Boston, MA 02133".
## NULL
It works! I can geocode offline for addresses in Rhode Island. It doesn’t work for Massachusetts addresses, but that was by design.
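Since the whole point is to keep PHI off the internet, and geocode_OSM() sends addresses to the online server unless told otherwise, it may be worth guarding against forgetting the server argument. Here is a minimal sketch of one way to do that. Both is_local_server() and geocode_local() are hypothetical helpers, not part of tmaptools, and the check only recognizes localhost and 127.0.0.1.

```r
# Hypothetical helper: TRUE only when the URL points at this machine.
# Assumption: "local" means localhost or 127.0.0.1, with an optional port.
is_local_server <- function(server) {
  grepl("^https?://(localhost|127\\.0\\.0\\.1)(:[0-9]+)?(/|$)", server)
}

# Hypothetical wrapper around tmaptools::geocode_OSM() that defaults to the
# local server and refuses to send addresses anywhere else.
geocode_local <- function(q, server = "http://localhost:8080/", ...) {
  if (!is_local_server(server)) {
    stop("Refusing to geocode: '", server, "' is not a local server.")
  }
  tmaptools::geocode_OSM(q, server = server, ...)
}

# Usage (requires tmaptools and the local Nominatim container to be running):
# geocode_local("82 Smith St, Providence, RI 02903")
```

Calling geocode_local() with the public server URL stops with an error before any address leaves the computer.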
Batch Geocoding
Multiple addresses can also be geocoded all at once:
lots_of_addresses <- c(RI_state_capitol, "194 Bellevue Ave, Newport, RI 02840", "24 Beacon St, Boston, MA 02133")
lots_of_points <- tmaptools::geocode_OSM(lots_of_addresses, server = "http://localhost:8080/")
## No results found for "24 Beacon St, Boston, MA 02133".
DT::datatable(lots_of_points, options = list(pageLength = 3, scrollX = TRUE))
The addresses could also come from a spreadsheet with a column of addresses.
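Here is a rough sketch of what the spreadsheet version could look like, assuming the spreadsheet has been saved as a CSV file. The file, its column names (id and address), and the geocode_column() helper are all made up for illustration. The sketch relies on the keep.unfound argument shown in the documentation excerpt earlier, which, as I understand it, keeps a row of NA values for addresses that are not found instead of dropping them.

```r
# Stand-in for a spreadsheet exported to CSV. The file and its column
# names ("id", "address") are invented for this example.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    id = 1:3,
    address = c("82 Smith St, Providence, RI 02903",
                "194 Bellevue Ave, Newport, RI 02840",
                "24 Beacon St, Boston, MA 02133")
  ),
  csv_path, row.names = FALSE
)

address_table <- read.csv(csv_path, stringsAsFactors = FALSE)

# Hypothetical helper: geocode one column and attach lat/lon to the table.
# Each distinct address is sent to the server once. merge() puts the
# address column first and may reorder the rows.
geocode_column <- function(df, column, server = "http://localhost:8080/") {
  found <- tmaptools::geocode_OSM(unique(df[[column]]), server = server,
                                  keep.unfound = TRUE, as.data.frame = TRUE)
  merge(df, found[, c("query", "lat", "lon")],
        by.x = column, by.y = "query", all.x = TRUE)
}

# Requires tmaptools and the local Nominatim container to be running:
# geocoded <- geocode_column(address_table, "address")
```

Keeping the id column alongside the coordinates makes it easier to join the results back to the rest of the data later.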
Maps
Then we can use all the points to make maps or do whatever else we want.
# Here's a library we might need
# If this library isn't already installed you need to run install.packages("tmap")
library(tmap)
tmap_mode("view")
## tmap mode set to interactive viewing
#here's a map of Rhode Island
library(sf)
# the shapefile comes from this website: https://www.rigis.org/datasets/edc::state-boundary-1997/about
ri <- st_read("https://services2.arcgis.com/S8zZg9pg23JUEexQ/arcgis/rest/services/State_Boundary_1997/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")
## Reading layer `OGRGeoJSON' from data source
## `https://services2.arcgis.com/S8zZg9pg23JUEexQ/arcgis/rest/services/State_Boundary_1997/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson'
## using driver `GeoJSON'
## Simple feature collection with 354 features and 4 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -71.89239 ymin: 41.14656 xmax: -71.12052 ymax: 42.01888
## Geodetic CRS: WGS 84
tm_shape(ri) + tm_fill(col = "green", alpha = 0.2) + tm_layout(frame = FALSE)
#Now let's put the points on the map
random_points_sf <- st_as_sf(lots_of_points, coords = c("lon","lat"), crs= st_crs(ri))
#here's a dynamic map
tm_shape(ri)+
  tm_fill(col = "green", alpha = 0.2)+
  tm_shape(random_points_sf)+
  tm_dots(col = "red")+
  tm_layout(frame = FALSE,
            outer.margins = c(0.1,0.1,0.1,0.1),
            title = "My favorite places in Rhode Island")
#and a static map
tmap_mode("plot")
## tmap mode set to plotting
tm_shape(ri)+
  tm_borders()+
  tm_shape(random_points_sf)+
  tm_dots(col = "blue", size = 2)+
  tm_layout(frame = FALSE,
            inner.margins = c(0.1,0.1,0.1,0.1),
            title = "My favorite places in Rhode Island",
            title.position = c(0,0.95))
FAQ and Troubleshooting
Q: When I try to run the Docker container I get an error about self-signed certificates.
A: Abandon all hope~ I think it’s a problem with how IT set up our firewall, and I haven’t figured out a workaround. I got that message for several weeks on the work computer. And then all of a sudden it started working. 🤷
Q: What is Docker?
A: I’m honestly not super sure either. It is a way to distribute software that isn’t specific to any operating system. We’re using it here because the Nominatim offline database is optimized to be set up on a computer running Linux. Here’s a link to their description page: https://docs.docker.com/get-started/overview/
Q: Are there security or privacy risks to this approach?
A: All software, including enterprise software from large companies such as Microsoft, may have security vulnerabilities. Docker and OpenStreetMap/Nominatim are well-established software, used and trusted by millions of users and organizations.
All the data stays behind our local firewall and is geocoded offline.